perm filename INT.EX[TLK,DBL] blob sn#197260 filedate 1976-01-20 generic text, type C, neo UTF8
This is an example of how "interestingness" can be manipulated meaningfully.


THE JOB LIST

There exists a global "job-list", whose entries are of the form
(<task> <reasons why it was suggested> <estimated worth>).
So the last component of each job is also an interestingness factor.

The "task" component usually has the form (do OP to the P part of concept C).
OP can be Fill-in, Check, Generalize, etc. 
P can be any facet name, like Examples, Generalizations, Definitions.
C can be the name of any known concept.

All interestingness factors range from 0 to 1000.
This really simulates the interval 0-1, but INTERLISP handles integers nicely,
especially small ones (below 1500).

There are two thresholds pertaining to the job-list.
One says how interesting a job must be to be remembered; any jobs
  whose worth is below that number are deleted.
The other says how interesting a job must be to be executed; if no job
  has a worth above that number, then AM will pause and try to find new jobs.
These factors change dynamically as the system runs.
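
A minimal sketch of this two-threshold scheme (in Python, not AM's actual
INTERLISP; the class and method names here are hypothetical):

```python
class JobList:
    """Hypothetical sketch of AM's job-list; entries are
    (task, reasons, worth), with worth in 0-1000 scaled down here."""
    def __init__(self):
        self.jobs = []
        self.keep_threshold = 20   # jobs below this worth are forgotten
        self.run_threshold = 60    # no job below this worth is executed

    def add(self, task, reason, worth):
        # A job too dull to remember is simply never recorded.
        if worth >= self.keep_threshold:
            self.jobs.append((task, [reason], worth))

    def pick(self):
        """Return the best job if it clears the execution threshold;
        otherwise None, and AM would pause to look for new jobs."""
        if not self.jobs:
            return None
        best = max(self.jobs, key=lambda j: j[2])
        return best if best[2] >= self.run_threshold else None

jl = JobList()
jl.add(("Fillin", "Examples", "UNION"), "no examples known", 60)
jl.add(("Fillin", "Examples", "SETS"), "no examples known", 10)  # forgotten
print(jl.pick())   # the UNION job survives and is chosen
```

Raising `run_threshold` as AM succeeds, and quartering it when nothing
qualifies, gives the dynamic behavior described later in the text.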

THE "WORTH" FACET

Let us assume that each concept C has attached to it a number Ic, representing
the interestingness of C. These factors can be accessed and changed dynamically.
In reality, in AM, each concept has a vector of a dozen numbers, representing
various aspects of interestingness, cost, past usefulness, right-to-life, etc.

These values are all set initially by hand, by me. There is no justification
for their starting values. Two weak arguments can be made:
(i) Since they are set by my introspection, they will be approximately
correct; even if they aren't, they will eventually stabilize at their true
values. OR: (ii) This is a bad idea, since the change in these values will
be slight and rare; meanwhile, they will drastically affect what is examined;
in other words, these starting values "prime" the system to make certain
discoveries and overlook others.  
Unfortunately, I can see evidence for both of these cases.

GETTING INTERESTED IN SETS

Assume that all concepts start out with interest rating about 200.

Initially, no jobs exist, so AM tries to find some. It comes up with
a job of the form (Fillin Examples of C) for each concept C, since no
examples are initially provided to AM.  These were suggested by a
heuristic on the Suggest facet of the Examples concept. It assigns a
mediocre worth rating (say .3 x Ic) to each such job.
AM picks one job to try first, say finding examples of UNION. 
All the heuristics from the following places are picked up and thrown
together into one large production system:
Fillin facet of Examples concept
Fillin facet of Any-concept concept
Any-facet facet of Union
Fillin facet of Examples facet of Union
Fillin facet of Examples facet of Operation
Fillin facet of Examples facet of Activity
Fillin facet of Examples facet of Any-concept

As they run, many of them will ask for examples of the domain of
the operation. Since UNION works on sets, this means there will be
several requests for examples of Sets. But none exist. Each time
a request like this is made and not fulfilled, a new job is
created, namely a task of (FILLIN EXAMPLES of SETS). The reason is
"to expedite filling in examples of UNION". The worth is given as
.3 x (worth of existing job), say in this case .3 x 60  = 18.
The first time this happens, AM discovers a job already on the
job-list, with the same task, but a different reason and worth,
namely "because no examples of Sets are known" and worth=60.
Since the reason is different, the worth is boosted to
LARGER(60+18, (SQRT( 60x60 + 18x18))) = 78.
The second, third,... hundredth time this happens, the reason already
is in the set of reasons for this job, so the worth is just incremented
by 1. At the end of the current job, no examples of union exist,
but the worth of the (Fillin Examples of Sets) job is up to 200 or so.
Incidentally, the value of Sets was pushed up slightly each time,
so it has now risen from 200 to 300.
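
The boosting rule just described can be sketched as follows (Python,
illustrative only; the function name and calling convention are my own):

```python
import math

def boost_worth(worth, reasons, new_reason, delta):
    """A genuinely new reason boosts the worth to
    LARGER(w+d, SQRT(w^2 + d^2)); a reason already on the
    job's list of reasons adds only 1."""
    if new_reason in reasons:
        return worth + 1, reasons
    return (max(worth + delta, math.sqrt(worth**2 + delta**2)),
            reasons + [new_reason])

w, rs = 60, ["because no examples of Sets are known"]
w, rs = boost_worth(w, rs, "to expedite filling in examples of UNION", 18)
print(w)   # 78, as in the text: LARGER(60+18, sqrt(3600+324))
for _ in range(99):    # the 2nd..100th identical requests add 1 each
    w, rs = boost_worth(w, rs, "to expedite filling in examples of UNION", 18)
print(w)   # 177
```

Note how the first occurrence of a new reason dominates; duplicates
contribute almost nothing, which matches the "200 or so" figure above.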

So the very next job selected will be that one. AM is now trying to
fill in examples of Sets.
Some of the relevant heuristics have AM manipulate the defn of sets.
Another bunch ask if there are some interestingness features that Sets
can have. That is, are there some predicates which might or might not
hold for a given set, but which indicate that the set is interesting if they
do hold. Such predicates are precisely the INTEREST part of each concept.
Notice that any such predicate that applies to x will also apply to
any specialization of x, so in our case we consider all the predicates on
SETS.INT (i.e., the Interest facet of the Sets concept),
STRUCTURE.INT
OBJECT.INT
ANY-CONCEPT.INT

One of these says that a structure S is interesting if all members are
examples of some interesting concept D.
The interest of S is then estimated to be 
100 + .05(Int. of D)(SMALLER 10, (length of S))

Another of these says that a structure S is interesting if all pairs of members
satisfy some interesting predicate P(x,y).
The interest of S is then estimated to be 
100 + .01(Int. of P)(SQUARE (SMALLER 10, (length of S)))
The final estimate is the SQRT of the sums of the squares of these guesses.

Note that the second heuristic favors large structures more than the first one does.
But the average length of the sets it creates is not tiny.
Structure.Int also indicates that either, but rarely both, of these may be
appended onto the definition of a class of structures, to specialize it.

So AM creates a new concept, Int-Set, defined as any set all pairs of whose
elements satisfy the same interesting predicate P.
The interest of this set is computed as 
root-sum-of-squares of the 2 numbers:
Int(Set) and (100 + .01(Int of best predicate)(Avg length of set)).
That is, 300 and (100 + .01(200)(10)); i.e., 300 and 120. This is about 320.
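
The two Structure.Int estimates and their root-sum-of-squares combination
can be written out like this (Python sketch; the function name is mine, and
the constants are the ones quoted above):

```python
import math

def structure_interest(int_d, int_p, length):
    """Combine the two guesses: all-members-examples-of-concept-D and
    all-pairs-satisfy-predicate-P, merged by root-sum-of-squares.
    int_d and int_p are interest ratings in 0-1000."""
    n = min(10, length)                     # (SMALLER 10, length of S)
    guess1 = 100 + 0.05 * int_d * n         # first heuristic
    guess2 = 100 + 0.01 * int_p * n * n     # second heuristic favors size
    return math.sqrt(guess1**2 + guess2**2)

# The Int-Set figure: Int(Set)=300 merged with 100 + .01*200*10 = 120
print(round(math.sqrt(300**2 + 120**2)))   # 323, i.e. about 320
```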

GETTING INTERESTED IN EQUALITY

Each time AM successfully completes a job, the thresholds of the
job-list are raised; say to 3/2 their old value.
Since AM has no job on its list now with interest over 60, it rapidly
comes to a point where no job is good enough to be executed.
At this time, two things happen. 
First, the thresholds are drastically lowered (to a quarter their old value),
and 2nd, AM pauses and asks each concept to suggest new jobs.

When this occurs, there is a new concept (Int-Sets), and Exs.Sugg suggests
filling in examples of Int-Sets, with worth estimated at .3x(int(Int-Sets))
which is .3x350= 105. This is much larger than any other job, hence is
executed next.

Each time AM finds an example of Int-Sets, it is really finding both a set
S and a predicate P, where all x and y in S satisfy P(x,y).
A heuristic says that each time, the interestingness of the P which is found
should be increased. Again, this is much more dramatic the very first time,
since after that the reason is the same.

In particular, AM finds examples of sets all pairs of whose elements are Equal.
E.g., {A}, { }, { {A,B,{C}} }. That is: singletons and the empty set.
Each time, the interest value of the Equal concept gets bumped upward.
By the end, it has risen from 200 to 300, say.

So of all the active concepts, Equal will be the first investigated.

Let's say that AM tries to find examples of things which are equal.
It tries random objects, and finds very few successes, say 2 out of 155 tries.
A heuristic knows that if this ratio is very low, then the predicate involved
might be too strict, too hard to satisfy; at least, a generalized version of it
might be just as worthwhile. So it adds the job
((Generalize the Defn of Equality) ("Equal is too strict") 335).
The 335 rating was computed by the heuristic as 
200 + .5(Current job's interest) + .3(Int. of the predicate involved)
which equals 200 + .5 x 90  +  .3 x 300 = 200 + 45 + 90.
The 200 is a mediocre "minimal" value; it indicates that we have a definite
reason for considering the generalizations, and that reason alone is enough
to make the job at least 200-interesting. The next factor indicates that
the existing job can influence this; if, for example, we had a dynamite
reason for dealing with examples of Equals (e.g., a direct command by the
user), some of that urgency would carry over to this new job.
Finally, the value depends a little on the estimated overall interest of Equals.
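
That worth computation, written out (Python; the function name is my own):

```python
def generalization_worth(job_interest, pred_interest):
    """The formula quoted in the text: 200 (base) plus half the current
    job's interest plus .3 of the predicate's overall interest."""
    return 200 + 0.5 * job_interest + 0.3 * pred_interest

print(generalization_worth(90, 300))   # 335.0, the rating given above
```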

This formula is possessed by the heuristic that suggests the job, and was
stuck in by me by hand. Again, there is no justification for it.
However, there are 2 rational arguments supporting it:
(1) I got these by introspection, beforehand, not via tuning/debugging.
(2) Big changes in the numbers involved don't make big changes in behavior.
For example, what's important was that AM raised the Equality concept AT ALL,
making it better than its fellow activities.

ONWARD TO PRIMES

This job just suggested, generalizing equality, leads to the creation of the
predicate "has-the-same-length-as", which is of course a monumental event.

It is a predicate whose interest factor is set to 400 initially.
Actually, its right-to-life depends on its having a much better ratio
of random exs to non-examples than Equal has (about .013). In fact, its 
ratio is about 10 times as big, and this has the effect of stabilizing
its right to life and increasing its value to 500.
Again, the precise number a concept has is never very important.

Below I sketch how unique factorization is discovered. I'll dispense with
interestingness numbers for a while, since this is really tangential
to the main features of that concept. Afterwards, we'll pick up that
treatment on a "splinter" trail that AM followed.

Canon.Sugg now can suggest the following job, when asked to suggest some:
(Apply Algorithms of Canonize to the 2 arguments: Equal and Same-length).
This is actually asking that the system create a new operation f with the
property that for all structures S1 and S2,
Same-length(S1,S2) iff Equal(f(S1),f(S2)). Using ≡,= for these relations, this
means that we want S1≡S2 iff f(S1)=f(S2).
Canon.Algs should contain procedures for constructing this function f,
by manipulating the definitions of = and ≡.
In fact, it comes up with a function f which takes any structure, converts
it into a BAG, then replaces every element in it by the constant "T".
So f((CLASS a b c d)) = (BAG T T T T),
And f((LIST a a b d c)) = (BAG T T T T T).

What this really amounts to is counting the length of the structure in unary.
In other words, a canonical BAG (with k T's) is the unary representation of
the number k. f maps all structures with k elements into that bag.
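
A sketch of f itself (Python; AM constructed this function by manipulating
the definitions of = and ≡, whereas here it is simply written down):

```python
def count_in_unary(structure):
    """f: replace every element of a structure by the constant T,
    yielding a canonical BAG whose length is the structure's length."""
    return ["T"] * len(structure)

print(count_in_unary(["a", "b", "c", "d"]))           # ['T', 'T', 'T', 'T']
print(len(count_in_unary(["a", "a", "b", "d", "c"])))  # 5
```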

In addition to this function f, AM constructs a recursive definition
to tell when a bag is canonical, namely
CB(S) = (if S is the empty bag, then T,
		else (first element of S is T)
			AND CB(Remove-1st-ele(S))).
This implicitly contains the knowledge that the numbers are linearly ordered.
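
The CB predicate translates directly (Python; the recursion mirrors the
definition above):

```python
def is_canonical_bag(bag):
    """CB: a bag is canonical iff it is empty, or its first element
    is T and the remainder of the bag is itself canonical."""
    if not bag:
        return True
    return bag[0] == "T" and is_canonical_bag(bag[1:])

print(is_canonical_bag(["T", "T", "T"]))   # True
print(is_canonical_bag(["T", "a"]))        # False
```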

We can rename canonical bags "Numbers".
So f:STRUCTURES→NUMBERS
We shall refer to f from now on as "Counting". 

A natural question that pervades math research is the following:
Given functions on 3 sides of a "square", find the missing fourth function:

PAIRS OF STRUCTURES →→→→→→→→(counting)→→→→→→→→→PAIRS OF NUMBERS
	↓						↓
	↓(cross-product)				↓(?)
	↓						↓
STRUCTURES →→→→→→→→→→→→→→→→→(counting)→→→→→→→→→→→→→→→→NUMBERS

AM is easily able to compute this, for given pairs of numbers.
Say we take 7 and 8. Then AM must find 2 structures with
7 and 8 elements. Of course, in the canonical-bags representation,
the numbers themselves are examples of such structures.
So AM can then feed (BAG T T T T T T T ) and (BAG T T T T T T T T )
directly to the function Cross-product, which creates a bag with
56 ordered pairs. The final application of counting yields a canonical
bag of that length: a bag of 56 T's. So the unknown function maps
(7,8) → 56. Sure enough, it turns out to be multiplication.
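
The walk around the square can be sketched as follows (Python; the name
`unknown_op` is mine, for the missing fourth function computed pointwise):

```python
from itertools import product

def count(structure):
    """f (counting): replace every element by T."""
    return ["T"] * len(structure)

def unknown_op(m, n):
    """Go around the square: represent m and n as canonical bags,
    cross-product them, then count the resulting bag of pairs."""
    pairs = list(product(["T"] * m, ["T"] * n))   # a bag of m*n pairs
    return len(count(pairs))

print(unknown_op(7, 8))   # 56 -- the mystery function is multiplication
```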

Heuristics later direct AM to consider its inverse, factoring, and
later to consider which numbers have an extreme number of factors
(very small or very large). In this way, primes and maximally-divisible
numbers are uncovered. It appears that most numbers decompose into
prime factors, and a relation is defined which associates to
any number all its decompositions into prime factors. Then the
full unique factorization theorem is simply the statement that
that relation is a function.

INTERACTIONS BETWEEN INTERESTINGNESS FEATURES

Let's see what happens when AM decides to fill in some examples of
compositions. One way is to pick 2 operations and try to compose them.
The heuristic that suggests this also knows to suggest picking them from
the most interesting operations available.
If A and B are the chosen ops, then the interestingness of the
resultant combination is estimated at
(SQUARE-ROOT [Int(Composition) x Int(A) x Int(B) / 1,000,000,000]) x 1000.
Recall that each Int ranges from 0 to 1000, so the product of the 3 will
range from 0 to 1,000,000,000. We divide by a billion to bring it into the 0-1
range, and square-rooting takes it into 0-1 again; the final x1000 rescales it.
Note that it doesn't make a lot of difference if we take the CUBE-ROOT of their
product, or any other reasonable function, as long as it satisfies:
(i) Increases with the Interest of each of the 3 concepts involved;
(ii) If any of them is very low in interest, then so is the result;
(iii) If all of them are average, then so is the result;
(iv)  If all of them are slightly above average, then the result is very high.
(v) Monotone nondecreasing, piecewise continuous mapping of [0,1] into itself.
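
In Python, the estimate and a couple of sample values (illustrative only;
the function name is my own):

```python
def composition_interest(int_comp, int_a, int_b):
    """SQUARE-ROOT of the normalized product of the three interest
    ratings (each 0-1000), rescaled back to the 0-1000 range."""
    return (int_comp * int_a * int_b / 1_000_000_000) ** 0.5 * 1000

print(round(composition_interest(500, 500, 500)))   # 354
print(round(composition_interest(0, 900, 900)))     # 0: a weak link kills it
```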


Since some of our best concepts are constructed as compositions, we must
have their constituents rated as good operations by themselves, if the
above heuristic is to pick them out and try to compose them.

There are other heuristics for picking the two operations, involving such
things as the meshing of domain/range of each, the intuitions that each
possesses, the reason that each was created, etc.

Unfortunately, this is the one serious flaw in the "unimportance of precise
values" claim. Right here, relatively small differences in Int. values
can cause large changes in the estimated worth of the operation for
composing. (e.g., the heuristic above not only uses the factors multiplicatively,
it roots the result, thereby exaggerating any differences even more).

If we want AM to consider the composition Count(Factors-of(x)), which is
crucial at some point if we are to notice primes, then either the user
must tell AM to consider it, or the user must say that these two operations
are very interesting in their own right, or AM must decide that for itself
(which it does, but that required a little tuning, which is not a good thing!!),
or the intuitions for them must mesh (which they didn't in practice, although
one could modify AM so that they did, but this is tuning again!).

The intuition for multiplication is the intuition for cross-product, essentially.
For divisors-of, it is simply the inverse of cross-product. If divisors-of
had been a primitive concept, it would be legal for me to give it a more
meaningful intuition (like breaking apart into pieces).
The intuition for counting is that of substitution (T for anything);
again, if I had put in Counting as a primitive, that is certainly NOT the
image that I would have given; I would have put in a notion of
studying a bunch of pieces of data, and combining the results into one datum.
In that way, the two hand-inserted intuitions (for counting and for divisors-of)
would have meshed, and the combined image of counting the divisors-of
would have been judged aesthetically appealing. 

This points out a serious problem, one of the biggest handicaps that any
mechanized system will face: developing new intuitions for the new concepts.
The mechanism that AM has can do only trivial combinations and alterations
of the initial set of intuitions. This is because AM does not have a huge
world model of experiences to draw upon. It would be interesting to
try to create enough of an external world model that new intuitions could
be synthesized from these simulated occurrences. This is quite nontrivial.